Chapter 4: Research Design

Testing a hypothesis

Cross tabulation

Assuming we have a categorical independent variable (IV) and a categorical dependent variable (DV):

iv dv
HIGH No
HIGH No
LOW No
HIGH Yes
HIGH Yes
HIGH Yes
HIGH Yes
LOW Yes
LOW Yes
LOW Yes

Cross tabulation: Step 1

Start by calculating the number of observations with each value of each category:

iv dv
HIGH No
HIGH No
LOW No
HIGH Yes
HIGH Yes
HIGH Yes
HIGH Yes
LOW Yes
LOW Yes
LOW Yes
iv
dv LOW HIGH
No 1 2
Yes 3 4
Total 4 6

Cross tabulation: Step 2

Then, calculate the proportion/percentage of observations among each value of the IV.

If the independent variable is in the columns, then the columns should sum to 100%.

If the independent variable is in the rows, then the rows should sum to 100%.

iv
dv LOW HIGH
No 1 2
Yes 3 4
Total 4 6
iv
dv LOW HIGH
No 1 (25%) 2 (33%)
Yes 3 (75%) 4 (67%)
Total 4 6
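Here is a minimal sketch of both steps in Python with pandas (not part of the original slides; the data frame just reproduces the ten toy observations above):

```python
import pandas as pd

# The ten toy observations from the tables above
df = pd.DataFrame({
    "iv": ["HIGH", "HIGH", "LOW", "HIGH", "HIGH",
           "HIGH", "HIGH", "LOW", "LOW", "LOW"],
    "dv": ["No", "No", "No", "Yes", "Yes",
           "Yes", "Yes", "Yes", "Yes", "Yes"],
})

# Step 1: raw counts of each iv/dv combination
print(pd.crosstab(df["dv"], df["iv"]))

# Step 2: percentages within each value of the IV
# (normalize="columns" makes each IV column sum to 100%)
print((pd.crosstab(df["dv"], df["iv"], normalize="columns") * 100).round(0))
```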

Cross tabulation: interpretation

Look at what happens to the DV at different values of the IV. If your variables are ordinal, you should be able to identify a direction of the effect.

The proportion of “Yes” values decreases as the IV goes from lower to higher, so this is a negative or inverse relationship.

iv
dv LOW HIGH
No 1 (25%) 2 (33%)
Yes 3 (75%) 4 (67%)
Total 4 6

Using a bar graph or line graph can make these relationships easier to spot.

Cross tabulation: notes

  • Key rule: always calculate percentages or proportions by categories of the independent variable.

    • This allows you to compare groups that are different sizes.
    • If one or both variables are interval-level, you can bin them in order to use them in a cross tab. For instance, you could separate an interval-level variable like age into a series of age ranges (see the sketch below).
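A quick sketch of binning with pandas (the ages, bin edges, and labels here are made up for illustration):

```python
import pandas as pd

# Hypothetical interval-level variable: respondent ages
ages = pd.Series([19, 23, 35, 41, 52, 67, 74, 30, 58, 22])

# pd.cut turns an interval variable into ordered categories
age_group = pd.cut(ages, bins=[17, 29, 44, 64, 120],
                   labels=["18-29", "30-44", "45-64", "65+"])
print(age_group.value_counts().sort_index())
```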

Cross tabulation: example

Hypothesis: in a comparison of individuals, independents are less likely to turn out to vote compared to people who support one party or another.

How should I calculate proportions here?

Voter Turnout in 2020 by party ID
Party ID
turnout2020 Democrat Independent Republican
0. Did not vote 335 316 382
1. Voted 3160 560 2714

Cross tabulation: example

Are these results generally consistent with my hypothesis?

Voter Turnout in 2020 by party ID
Party ID
turnout2020 Democrat Independent Republican
0. Did not vote 335 (10%) 316 (36%) 382 (12%)
1. Voted 3160 (90%) 560 (64%) 2714 (88%)

If we think of party ID as an ordered variable, this is a curvilinear relationship.

Row/Column percentages

What happens if I calculate % among the values of the DV instead of the IV?

For comparison, here’s the relationship between education and voter turnout calculated the standard way, with % computed within each education level (the IV):

Voter Turnout in 2020 by highest level of education completed
Education
turnout2020 1. Less than high school credential 2. High school credential 3. Some post-high school, no bachelor's degree 4. Bachelor's degree 5. Graduate degree
0. Did not vote 130 (41%) 286 (24%) 380 (15%) 135 (7%) 91 (6%)
1. Voted 185 (59%) 883 (76%) 2148 (85%) 1749 (93%) 1388 (94%)
Note:
Column % in parentheses

The results suggest a positive or direct relationship: as education increases, so does the % turnout.

Row/Column percentages

What happens if I calculate % among the values of the DV?

Here’s the same relationship with % calculated within each value of voter turnout (the DV):

Voter Turnout in 2020 by highest level of education completed
Education
turnout2020 1. Less than high school credential 2. High school credential 3. Some post-high school, no bachelor's degree 4. Bachelor's degree 5. Graduate degree
0. Did not vote 130 (13%) 286 (28%) 380 (37%) 135 (13%) 91 (9%)
1. Voted 185 (3%) 883 (14%) 2148 (34%) 1749 (28%) 1388 (22%)
Note:
Row % in parentheses

Here, the results can give the misleading impression that there’s a curvilinear relationship: turnout appears to drop off for Bachelor’s degrees and above, even though these row percentages only describe the educational makeup of voters and non-voters.

Row/Column percentages

Either of these tables might be a valid way to look at these data, but they answer slightly different questions:

  • If I want to compare turnout at different levels of education, then I need to calculate % turnout among people with different levels of education.

  • If I want to compare education among voters and non-voters, then I need to calculate % education among people who voted and didn’t vote.

  • Which variable is the IV and which is the DV is sometimes a theoretical question, but in this case it’s unlikely that voting causes people to become more educated, so it probably doesn’t make sense to calculate percentages by voting vs. non-voting. (The sketch below shows both calculations.)
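In pandas, the two calculations differ only in the normalize argument (a sketch with made-up values):

```python
import pandas as pd

# Hypothetical respondent-level data (values are invented)
df = pd.DataFrame({
    "education": ["HS", "HS", "HS", "BA", "BA", "BA", "BA", "HS"],
    "voted":     ["Yes", "No", "No", "Yes", "Yes", "Yes", "No", "Yes"],
})

# Turnout at each education level: columns (the IV) sum to 100%
print(pd.crosstab(df["voted"], df["education"], normalize="columns"))

# Education among voters vs. non-voters: rows (the DV) sum to 100%
print(pd.crosstab(df["voted"], df["education"], normalize="index"))
```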

Mean Comparison

When we have an interval-level outcome and a categorical independent variable, we can group observations by values of the IV and then calculate the mean of the outcome within each group.

For instance, suppose I want to examine the relationship between national wealth and carbon emissions. My hypothesis is that wealthier nations will have more emissions compared to poorer nations.

country gdp.percap.5cat co2.percap
Afghanistan 1. $3k or less 0.281803
Albania 3. $10k to $25k 1.936486
Algeria 3. $10k to $25k 3.988271
Angola 2. $3k to $10k 1.194668
Argentina 3. $10k to $25k 3.995881
Armenia 3. $10k to $25k 2.030401
Australia 5. $45k or more 16.308205
Austria 5. $45k or more 7.648816
Azerbaijan 3. $10k to $25k 3.962984
Bahrain 5. $45k or more 20.934996

Mean Comparison

GDP data has been grouped into five categories, so now I just need to calculate the average of CO2 emissions within each group of the ordinal IV:

GDP Per capita range CO2 emissions per capita
1. $3k or less 0.3128312
2. $3k to $10k 1.2680574
3. $10k to $25k 4.4065669
4. $25k to $45k 8.0307610
5. $45k or more 12.3134306

Is this generally consistent with expectations?
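In pandas, this grouping is a one-liner (a sketch using a few of the rows shown above):

```python
import pandas as pd

# A few of the country rows from the table above
df = pd.DataFrame({
    "gdp.percap.5cat": ["1. $3k or less", "3. $10k to $25k", "3. $10k to $25k",
                        "2. $3k to $10k", "5. $45k or more", "5. $45k or more"],
    "co2.percap": [0.28, 1.94, 3.99, 1.19, 16.31, 7.65],
})

# Group observations by the ordinal IV, then average the DV within each group
print(df.groupby("gdp.percap.5cat")["co2.percap"].mean())
```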

Mean Comparison

Here again, the relationship can be easier to conceptualize if we plot it.

Curvilinear Relationships

A relationship like this will rarely be perfectly straight, so “linearity” and “curvilinearity” are partly a matter of degree, but there are some cases where there is a clear “U” shape to the relationship:

iv dv
1. Extremely liberal 6.314
2. Liberal 5.685
3. Slightly liberal 5.001
4. Moderate; middle of the road 4.651
5. Slightly conservative 4.636
6. Conservative 4.974
7. Extremely conservative 5.363

Research Design

Rival Explanations

  • How can we distinguish correlation from causation?

  • This process inevitably requires us to consider rival explanations for an observed relationship:

    • For instance: if I find that social media use is correlated with a lower likelihood of turning out to vote, I might ask whether age is a confounder that could explain this correlation.

Confounding

What I want to show is that Fox News viewership causes a decreased chance of getting a Covid vaccine.

[DAG: Fox News (X) → Covid Vaccine (Y)]

Confounding

There’s a correlation, but I’m concerned this relationship is spurious because I know that things like existing political views are already correlated with media consumption, and those might explain any correlation I see here:

[DAG: Conservatism (Z) → Fox News (X); Conservatism (Z) → Covid Vaccine (Y)]

It’s possible that this difference in ideology accounts for the entire observed correlation between media habits and vaccination. I can’t really rule this possibility out without further investigation.

Confounding

What if I could randomly assign people to watch Fox News? Random assignment would ensure that nothing is correlated with Fox News viewership.

[DAG: Conservatism (Z) → Covid Vaccine (Y); Fox News (X) → Covid Vaccine (Y)]

Ideology may still matter for getting a vaccine, but if conservatism is randomly distributed between viewers and non-viewers, it no longer confounds the observed relationship.

Experiments

  • Experiments use random assignment to account for rival explanations. If you randomly assign people to receive a “treatment”, then you can ensure that there is no confounding because nothing is correlated with your IV.

  • The classic examples are in medicine:

    • Group A is randomly assigned to receive a placebo (the control group)

    • Group B is randomly assigned to receive a medicine (the treatment group)

    • After a certain period of time, we compare the outcomes for both groups.

    • Differences between the groups can be attributed to the effect of the treatment (+/- some random sampling error; see the simulation below)
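A toy simulation of this logic (the outcome values and the effect size of 2.0 are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Randomly assign each subject to treatment (1) or control (0)
treated = rng.integers(0, 2, size=n)

# Hypothetical outcome: baseline noise plus a true treatment effect of 2.0
outcome = rng.normal(10, 3, size=n) + 2.0 * treated

# The difference in group means recovers the effect (+/- sampling error)
diff = outcome[treated == 1].mean() - outcome[treated == 0].mean()
print(f"estimated effect: {diff:.2f} (truth: 2.00)")
```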

Experiments

  • Experiments are considered a “gold standard” because they can account for all kinds of confounding, including confounding caused by unobserved or unexpected relationships.

  • However, they have two key limitations:

    • External validity: results in the lab may not easily translate to results in real life.

    • Feasibility: many interesting questions just can’t be randomly assigned. We can’t assign “democracy” or “war” or “religion” to people.

Experiments: Field Experiments

Field experiments can lessen the external validity problem by using random assignment in the field.

For instance, one common way to study GOTV messaging is to randomly select households to receive mailers:

The four treatment mailers: Civic Duty, Hawthorne, Neighbors, and Self.

From: Gerber, A. S., Green, D. P., & Larimer, C. W. (2008). Social Pressure and Voter Turnout: Evidence from a Large-Scale Field Experiment. American Political Science Review, 102(1), 33–48. doi:10.1017/S000305540808009X


Experiments: Natural Experiments

Field experiments can face fewer external validity problems, but some things still can’t be experimentally manipulated.

Natural experiments use “quasi” randomization or “randomization by nature” where treatments are assigned more-or-less randomly.

Experiments: Natural Experiments

  • Viewing Fox News isn’t random, but areas where Fox News is lower in the channel order will have more viewers.

  • Channel order is essentially randomly assigned.

  • So, using channel order as a “treatment” assignment might theoretically allow us to account for confounding in an observational setting.

Experiments: Natural Experiments

Other sources of quasi randomization include:

  • Lotteries (like the Vietnam Draft, or the literal lottery)

  • Arbitrary cutoffs (barely winning an election vs. barely losing)

  • Natural disasters and weather events

Still, natural experiments require a mixture of creativity and luck. They’re not available for most questions.

Observational Research

Experimental Research: “treatment” (the independent variable) is randomly assigned by the researcher in order to identify cause and effect relationships.

Quasi-Experiments: the independent variable is “randomly” assigned, but not by the researcher. For instance: a policy that is distributed by a random lottery.

Observational Research: nothing is randomly assigned, at least not by researchers. Phenomena are observed in the real world. This is easier to implement, but it’s much harder to distinguish correlations from true causation because lots of things are non-random.

Observational Research and sampling

  • This category includes qualitative methods, but we’re focusing mostly on quantitative (large-N) methods.

  • In quantitative studies of countries or states, we might not worry much about questions of “representativeness” because we can collect data on all of the countries or states.

  • But in quantitative research on people, it’s typically not feasible to study everyone. We’ll need to sample and then make generalizations.

Observational Research: Survey vs. Census

  • A true population census means studying everyone, or nearly everyone. This is rarely feasible except with a lot of resources!

  • Survey research aims to find a representative sample of the population and then make inferences about the population.

    • Representativeness is a central concern here. If our samples aren’t representative, then they can’t be generalized to the population.
    • The use of surveys introduces problems of statistical inference and uncertainty. There is a chance that a random sample will be badly unrepresentative just by chance (just like a coin can land on tails 5 times in a row), but that risk is quantifiable and it becomes extremely small when we have a lot of observations.
    • I might not be able to tell you the exact % of the public that approves of Congress, but I can tell you the probability of being off by 20 percentage points in a 1,000 person survey (see the simulation below).
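A quick back-of-envelope check of that claim, as a simulation (a sketch assuming simple random sampling from a 50/50 population):

```python
import numpy as np

rng = np.random.default_rng(0)
n, true_p, sims = 1000, 0.50, 100_000

# Simulate 100,000 random 1,000-person polls of a 50/50 population
estimates = rng.binomial(n, true_p, size=sims) / n

# Small misses happen; a 20-point miss essentially never does
print("P(off by 2+ points): ", np.mean(np.abs(estimates - true_p) >= 0.02))
print("P(off by 20+ points):", np.mean(np.abs(estimates - true_p) >= 0.20))
```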

The 1936 Literary Digest Poll

  • Literary Digest wanted to predict the outcome of the 1936 presidential election.

  • Mailed out approx. 10 million “ballots” based on address data from social clubs, automobile registrations, phone books, etc. About 2.4 million were returned.

  • 10 million is a far larger sample than the target for most polls

  • A 24% response rate was low for the time, but much higher than most present-day polls achieve

  • Scientific polling was in its infancy; Literary Digest touted its lack of any fancy data manipulation as an advantage over polls that used more complex methodologies.

The 1936 Literary Digest poll

Prediction for Landon: 54% of the popular vote
Prediction for Roosevelt: 41% of the popular vote

The 1936 Literary Digest poll

Not quite! Roosevelt won in a landslide, taking roughly 61% of the popular vote.

The 1936 Literary Digest poll

  • Contemporaneous analyses attributed the error to sampling bias: Literary Digest targeted car owners and telephone owners, so the results skewed toward the wealthy

  • Subsequent reanalyses point to non-response bias: Roosevelt supporters were systematically less likely to return the postcard compared to Landon supporters.

    • Anecdotal evidence suggests that this may have been the result of different levels of enthusiasm: Landon voters really didn’t like Roosevelt and they were motivated to talk about it.

    • Sample size doesn’t fix bias! A much smaller random poll can easily outperform a large-yet-biased one.

Lusinchi D. “President” Landon and the 1936 Literary Digest Poll: Were Automobile and Telephone Owners to Blame? Social Science History. 2012;36(1):23-54. doi:10.1017/S014555320001035X


Sampling strategies

A note that’s still relevant today: non-response bias is a big risk! Some people are more likely to take polls than others.

Sampling strategies

Contemporary polls often get things wrong, but many general election pollsters are able to get a lot closer with far fewer observations (even as conditions get worse!). How?

Source: fivethirtyeight.com

Sampling strategies: Simple Random Samples

  • Simple Random Samples: get a list of the entire target population, and start selecting at random.

  • Pros: we’ll converge toward representativeness +/- some random error. As we approach sample sizes of 1,000 or more, the probability of a very large random error becomes negligible.

  • Cons:

    • Where am I supposed to get a list of the entire population!?

    • Can be inefficient: we need a lot of data to get the margin of error to an acceptable level

    • What if I want to study public opinion among a small group? Say: convicted felons, LGBTQ people, or millionaires?

In the old days, you might use a book of random number tables to generate random numbers. Now we can just do it with a computer.
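For instance, a sketch of a simple random sample with numpy (the population list and its size are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1936)

# Hypothetical sampling frame: a complete list of the target population
population = [f"person_{i}" for i in range(100_000)]

# Draw 1,000 members at random, without replacement
sample = rng.choice(population, size=1000, replace=False)
print(sample[:5])
```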

Sampling strategies: Stratification and Clustering

  • Stratified Sampling divides the population into groups and then randomly samples within those groups.

    • For instance: I might group people by generation (Z, millennial, gen-X, etc.) and then randomly sample until I hit a target quota for each generation (sketched below).
  • Pros: Ensuring representativeness isn’t left entirely to chance, and I can even oversample certain groups if they’re hard to reach or rare in the population.

  • Cons: this isn’t simple random sampling, so we will need to re-weight the data to make it look like the actual population and account for this when calculating our margin of error.
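A sketch of quota-style stratified sampling in pandas (the frame and generation labels are made up; groupby(...).sample() draws randomly within each stratum):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)

# Hypothetical population frame with a generation column
frame = pd.DataFrame({
    "id": range(10_000),
    "generation": rng.choice(["Z", "Millennial", "Gen-X", "Boomer"], size=10_000),
})

# Randomly sample within each group until we hit a quota of 100 per generation
strat = frame.groupby("generation").sample(n=100, random_state=3)
print(strat["generation"].value_counts())
```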

Sampling strategies: Stratification and Clustering

  • Cluster Sampling: instead of targeting individuals, I might target a unit like a household, a city block, or a census tract (sketched below).

  • Pros: Potentially much easier to target a random sample of something like households.

  • Cons: again, not a simple random sample, so we may need to do some re-weighting and account for this when calculating our margin of error.
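A sketch of the two-step logic (the frame, household counts, and cluster quota are invented): sample whole households at random, then survey everyone inside them.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# Hypothetical frame: 50,000 people nested in 5,000 households
frame = pd.DataFrame({
    "person_id": range(50_000),
    "household": rng.integers(0, 5_000, size=50_000),
})

# Step 1: randomly sample 500 whole clusters (households)
chosen = rng.choice(frame["household"].unique(), size=500, replace=False)

# Step 2: keep every person inside each sampled household
cluster_sample = frame[frame["household"].isin(chosen)]
print(len(cluster_sample), "people in", len(chosen), "households")
```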

Sampling strategies: sample weighting

  • Contemporary scientific polls like the ANES are typically stratified and clustered, mostly because of cost-efficiency: all else equal, it’s much cheaper to get a representative sample using stratification and clustering.

  • Contemporary polls still have to contend with the problem of response bias as well. Some groups are systematically less likely to respond to polls.

  • But this is why you’ll often need to use weights when working with NES data: weights allow us to make a non-representative sample approximate a representative sample.

race N (unweighted) N (weighted)
1. White, non-Hispanic 5963 (73%) 5383 (66%)
2. Black, non-Hispanic 726 (9%) 935 (11%)
3. Hispanic 762 (9%) 1108 (14%)
4. Asian or Native Hawaiian/other Pacific Islander, non-Hispanic alone 284 (3%) 325 (4%)
5. Native American/Alaska Native or other race, non-Hispanic alone 172 (2%) 152 (2%)
6. Multiple races, non-Hispanic 271 (3%) 296 (4%)

Sampling strategies: sample weighting

  • The basic idea of weighting is simple: if you have twice as many white, college-educated respondents as you would expect in a representative sample, then you just make each white, college-educated response count as 1/2 an observation. We call this inverse probability weighting (sketched below).

    • We know the expected numbers for lots of demographics because of U.S. census data, which isn’t a sample at all.
  • In practice, this can be complicated because we generally want to weight for lots of characteristics at once, or make inferences about target populations (ex: likely voters) whose actual size is not known beforehand.
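A minimal sketch of inverse probability weighting for a single characteristic, using shares close to the race table above (real weighting schemes adjust many characteristics simultaneously):

```python
import pandas as pd

# Sample shares vs. census-style target shares for one characteristic
sample_share = pd.Series({"White": 0.73, "Black": 0.09, "Hispanic": 0.09})
target_share = pd.Series({"White": 0.66, "Black": 0.11, "Hispanic": 0.14})

# Inverse probability weight: over-represented groups count for less
weights = (target_share / sample_share).round(2)
print(weights)  # White 0.90, Black 1.22, Hispanic 1.56
```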

Sampling strategies: sample weighting

  • Something to keep in mind: weighting can go very wrong! Differences of opinion about how to weight things account for a lot of the systematic differences across public opinion surveys that ask the same questions.

  • Weighting can’t solve everything. It’s much easier to account for things like small demographic differences than for more complex problems like differential non-response. We know how many 18-25 year olds there are, but we don’t always know the number of people who are vulnerable to social desirability bias.

Sampling strategies: Non-Random Samples

  • Snowball Sampling is a method of surveying very small or hard-to-reach populations, like the homeless, drug users, or experts on medieval history. It works like a chain letter:

    • Survey some number of known members

    • Ask those members to give you the contact information for other members

    • Repeat N times.

  • There are no guarantees that a Snowball Sample won’t have the Literary Digest problem; it’s only viable as a research strategy because some groups are so hard to reach that they just can’t be studied any other way.

Sampling strategies: Convenience Samples

  • Convenience Samples are non-representative samples that simply target a population that is easy to access. (College students are a classic example)

  • Convenience Samples are not representative and don’t really aim to be. Instead, they’re often used to do things like quickly test out a survey or experiment. It’s unlikely they can be generalized to the population, even with re-weighting.

  • Our class survey will be something like a mixture of snowball and convenience sampling, so we won’t have representative data. We’ll ignore this problem, but it’s definitely something to keep in mind when assessing real-world polls!

Sampling strategies: What about non-survey data?

  • Data on countries or U.S. states often aren’t really samples at all, so we don’t typically have to worry as much about sampling issues, but random error and bias will still come up.

  • Some sources of bias for cross-national data:

    • Countries may not keep complete records.

    • Some sources may disagree on whether some countries exist.

    • Some national statistics may be based on surveys that have their own biases.

  • For our purposes, we’ll treat data sets like the “states” and “world” data sets included with your workbook as representative samples from … a potentially infinite population of states or countries.


Observational data and control

  • Assuming we’ve collected good observational data: how do we control for confounding without randomization?
  • We’ll address this issue using controlled comparisons: essentially, we will split the data up by values of a potential confounder and see whether the same general relationship still shows up among groups with the same values on the confounding variable (sketched below).
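A sketch of a controlled comparison using the Fox News example (all values are invented): split by the confounder, then re-run the crosstab within each group.

```python
import pandas as pd

# Hypothetical data: media habit (X), vaccination (Y), ideology (confounder Z)
df = pd.DataFrame({
    "fox":      ["Yes", "Yes", "No", "No", "Yes", "No", "Yes", "No"],
    "vax":      ["No",  "No",  "Yes", "Yes", "Yes", "Yes", "No", "Yes"],
    "ideology": ["Cons", "Cons", "Lib", "Lib", "Cons", "Cons", "Lib", "Lib"],
})

# Split the data by the confounder, then check whether the same
# relationship shows up within each group
for level, subset in df.groupby("ideology"):
    print(f"\nideology = {level}")
    print(pd.crosstab(subset["vax"], subset["fox"], normalize="columns").round(2))
```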